
Introducing New Open Source Documentation Resources

Wednesday, May 28, 2025


Today we're introducing two new open source documentation resources for open source software maintainers, a Docs Advisor guide and a set of Documentation Project Archetypes. These tools are intended to help maintainers make effective use of limited resources when it comes to planning and executing open source documentation work.

The Docs Advisor is a guide intended to demystify documentation work, with help on picking a documentation approach, understanding your audience and available resources, and writing, revising, evaluating, and maintaining your documentation.

Documentation Project Archetypes are a set of thirteen project field guides. Each archetype describes a different type of documentation project, the problems it can solve, and how to bring the right collaborators together to create great docs.

Origin story

More than 130 open source projects wrote 200+ case studies and project reports as a part of their participation in the Google Season of Docs program from 2019 to 2024. These case studies and project reports represent a variety of documentation projects from a wide range of open source groups. In these wrap-ups, project maintainers and technical writers describe how they approached their documentation projects, capturing many successes and more than a few challenges.

These reports are a treasure trove of lessons learned, but it's unrealistic to expect time-crunched open source maintainers to read through them all. So we got in touch with Daniel Beck and Erin Kissane to chat about ways to help organize and summarize some of those lessons.

These conversations turned into the Docs Advisor guide ('like having an experienced technical writer hanging over your shoulder') and the thirteen Documentation Project Archetypes.

Our goal with these resources was to turn all of the hard-won experience of the Google Season of Docs participants into explicit documentation advice and guidance for open source maintainers.

More about the Docs Advisor

The Docs Advisor guide is intended to demystify the work of good documentation. It collects practices and processes from within technical writing and docs communities and from user experience, information architecture, and content strategy.

  • In Part 1, you'll pick an overall approach that suits the needs of your project.
  • In Part 2, you'll learn enough about your community and their needs to ensure that your hard work will be helping real people.
  • In Part 3, you'll assess your existing resources and pull together everything you need to move quickly and confidently through the work of creating and revising your docs.
  • In Part 4, you'll get to work writing and revising your docs and set yourself up to successfully evaluate and maintain your work.

The Docs Advisor guide also includes a docs plan template to help you capture your plan, including questions like:

  • What approach will you take to your documentation work, as a whole?
  • What risks do you need to mitigate?
  • Are there any documents to make or steps to perform to increase your chances of success?

The Docs Advisor incorporates guidance from interviews with open source maintainers and technical writers as well as from the Google Season of Docs case studies, and integrates the Documentation Project Archetypes into the recommendations for maintainers planning docs work.

More about the Archetypes

Documentation Project Archetypes are meant to help you recognize common types of documentation work (whether you're writing a new user guide or replatforming your docs site), understand the situations in which they apply, and organize yourself to bring the work to completion.

The archetypes cover the following areas:

  • Planning and evaluating your docs: Experiment and analysis archetypes support future docs work by helping you learn more about your existing docs, your audience, and your capacity to deliver meaningful change.
  • Producing new docs: Creation archetypes make new docs that directly help your audience complete tasks and achieve their goals.
  • Revising and transforming existing docs: Revision archetypes modify existing docs to improve quality, reduce maintenance costs, and reach wider audiences.
  • Equipping yourself with docs tools and processes: Tool and process archetypes adopt new tools or practices that help you produce more docs, better docs, or both.

All of the archetypes are available on GitHub.

Preview illustrations for three archetypes: The Edit (a secretary bird holding a red pencil and a doc marked up for editing), The Audit (an otter holding an abacus and a red pie-shaped wedge against a background of pie charts and line charts), and The Factory (robot arms holding a red angled block against a backdrop of abstract circuitry in green and black).

Doc tools in the wild

We are excited to share these tools and are looking forward to seeing how they are used and evolve.

Daniel demoed the concept and first completed archetype, The Migration, at FOSDEM 2025 in his talk Patterns for maintainer and tech writer collaboration. He also talked about the work on the API Resilience Podcast episode "Patterns in Documentation."

We hope to get valuable feedback during a proposed Doc Archetypes session at Open Source Summit Europe 2025 (acceptance pending).

We are also excited to be developing some Doc Archetype illustration cards with Heather Cummings — a few previews are already live on The Edit, The Audit, and The Factory.

If you have questions or suggestions, please raise an issue in the Open Docs repo.

By Elena Spitzer & Erin McKean, Google Open Source Programs Office

Transforming Kubernetes and GKE into the leading platform for AI/ML

Wednesday, May 21, 2025

The world is rapidly embracing the power of AI/ML, from training cutting-edge foundation models to deploying intelligent applications at scale. As these workloads become more sophisticated and demanding, the infrastructure required to support them must evolve. Kubernetes has emerged as the standard for container orchestration, but AI/ML introduces unique challenges that push traditional infrastructure to its limits.

AI training jobs often require massive scale, coordinating thousands of specialized accelerators like GPUs and TPUs. Reliability is critical, as failures can be costly for long-running, large-scale training jobs. Efficient resource sharing across teams and workloads is essential given the expense of accelerators. Furthermore, deploying and scaling AI models for inference demands low latency and faster startup times for large container images and models.

At Google, we are deeply invested in the AI/ML revolution. This is why we are doubling down on our commitment to advancing Kubernetes as the foundational open standard for these workloads. Our strategy centers on evolving the core Kubernetes platform to meet the needs of the "next trillion core hours," specifically focusing on batch and AI/ML. We then bring these advancements, alongside enterprise-grade management and optimizations, to users through Google Kubernetes Engine (GKE).

Here's how we are transforming Kubernetes and GKE:

Redefining Kubernetes' relationship with specialized hardware

Kubernetes was initially designed for more uniform CPU compute. The surge of AI/ML brought new requirements for seamless integration and efficient management of expensive, sparse, and diverse accelerators. To support these new demands, Google has been a key investor in upstream Kubernetes to offer robust support for a diverse portfolio of the latest accelerators, including multiple generations of TPUs and a wide range of NVIDIA GPUs.

A core Kubernetes enhancement driven by Google and the community to better support AI/ML workloads is Dynamic Resource Allocation (DRA). This framework, developed in the heart of Kubernetes, provides a more flexible and extensible way for workloads to request and consume specialized hardware resources beyond traditional CPU and memory, which is crucial for efficiently managing accelerators. Building on such foundational open-source capabilities, GKE can then offer features like Custom Compute Classes, which improve the obtainability of these resources through intelligent fallback priorities across different capacity types like reservations, on-demand, and Spot instances. Google's active contributions to advanced resource management and scheduling capabilities within the Kubernetes community ensure that the platform evolves to meet the sophisticated demands of AI/ML, making efficient use of these specialized hardware resources more broadly accessible.
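To make the DRA flow concrete, here is a minimal sketch (an illustration, not an excerpt from GKE documentation) that uses the Kubernetes Python client to create a ResourceClaim requesting one device; the API version, the namespace, and the DeviceClass name "example.com-gpu" are assumptions for a DRA-enabled cluster.

    # Minimal DRA sketch: request one accelerator via a ResourceClaim.
    # Assumes a cluster with DRA enabled (resource.k8s.io/v1beta1) and a
    # hypothetical DeviceClass named "example.com-gpu" installed by a device driver.
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() when running in-cluster

    resource_claim = {
        "apiVersion": "resource.k8s.io/v1beta1",
        "kind": "ResourceClaim",
        "metadata": {"name": "single-gpu-claim"},
        "spec": {
            "devices": {
                "requests": [
                    {"name": "gpu", "deviceClassName": "example.com-gpu"}
                ]
            }
        },
    }

    # ResourceClaim lives in the resource.k8s.io API group, so the generic
    # custom-objects API is a convenient way to create it from Python.
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="resource.k8s.io",
        version="v1beta1",
        namespace="default",
        plural="resourceclaims",
        body=resource_claim,
    )

A Pod can then reference the claim through its spec.resourceClaims field, and the scheduler will only place it on a node where a matching device can be allocated.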

Unlocking scale and reliability

AI/ML workloads demand unprecedented scale and have new failure modes compared to traditional applications. GKE is built to handle this, supporting up to 65,000 nodes in a single cluster. We've demonstrated the ability to run the largest publicly announced training jobs, coordinating 50,000 TPU chips with near-ideal scaling efficiency.

Critically, we are enhancing core Kubernetes capabilities to support the scale and reliability needed for AI/ML. For instance, to better manage distributed AI workloads like serving large models split across multiple hosts, Google has been instrumental in developing features like JobSet (emerging from earlier concepts like LeaderWorkerSet) within the Kubernetes community (SIG Apps). This provides robust orchestration for co-scheduled, interdependent groups of Pods. We are also actively working upstream to improve Kubernetes reliability and stability through initiatives like Production Readiness Reviews, promoting safer upgrade paths, and enhancing etcd stability for the benefit of all Kubernetes users.
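As a rough illustration of what JobSet expresses, the sketch below builds a JobSet manifest as a Python dict: a leader Job and a group of worker Jobs that are created, co-scheduled, and restarted as one unit. The API version reflects the jobset.x-k8s.io project, the image name is a placeholder, and the JobSet CRD is assumed to be installed on the cluster.

    # Illustrative JobSet manifest: one leader and several workers managed together.
    # Assumes the JobSet CRD (jobset.x-k8s.io) is installed; the image is a placeholder.

    def replicated_job(name: str, replicas: int) -> dict:
        """Build one replicatedJobs entry wrapping a standard batch/v1 Job template."""
        return {
            "name": name,
            "replicas": replicas,
            "template": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": "example.com/model-server:latest"}
                            ],
                        }
                    }
                }
            },
        }

    jobset = {
        "apiVersion": "jobset.x-k8s.io/v1alpha2",
        "kind": "JobSet",
        "metadata": {"name": "multihost-serving"},
        "spec": {
            "replicatedJobs": [
                replicated_job("leader", replicas=1),
                replicated_job("workers", replicas=3),
            ]
        },
    }

The manifest can be applied with the same generic custom-objects call shown earlier, or with kubectl; the point is that the whole group succeeds, fails, and restarts together rather than as unrelated Pods.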

Optimizing Kubernetes performance for efficient inference

Low-latency and cost-efficient inference is critical for AI applications. For serving, the GKE Inference Gateway routes requests based on model server metrics like KVCache utilization and pending queue length, reducing serving costs by up to 30% and tail latency by 60% compared to traditional load balancing. We've even achieved vLLM fungibility across TPUs and GPUs, allowing users to serve the same model on either accelerator without incremental effort.

To address slow startup times for large AI/ML container images (often 20GB+), GKE offers rapid scale-out features. Secondary boot disks allow preloading container images and data, resulting in up to 29x faster container mounting time. GCS FUSE enables streaming data directly from Cloud Storage, leading to faster model load times. Furthermore, GKE Inference Quickstart provides data-driven, optimized Kubernetes deployment configurations, saving extensive benchmarking effort and enabling up to 30% lower cost, 60% lower tail latency, and 40% higher throughput.
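As one concrete illustration of the GCS FUSE path, the sketch below defines a Pod that mounts a Cloud Storage bucket of model weights through the GKE Cloud Storage FUSE CSI driver rather than baking the weights into the container image. The bucket and image names are placeholders, and the cluster is assumed to have the driver enabled.

    # Illustrative Pod spec: stream model weights from Cloud Storage via GCS FUSE.
    # The bucket and image names are placeholders; the GKE Cloud Storage FUSE CSI
    # driver must be enabled on the cluster for this to work.
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "model-server",
            "annotations": {"gke-gcsfuse/volumes": "true"},  # opt in to the FUSE sidecar
        },
        "spec": {
            "containers": [{
                "name": "server",
                "image": "example.com/inference-server:latest",
                "volumeMounts": [
                    {"name": "model-weights", "mountPath": "/models", "readOnly": True}
                ],
            }],
            "volumes": [{
                "name": "model-weights",
                "csi": {
                    "driver": "gcsfuse.csi.storage.gke.io",
                    "volumeAttributes": {"bucketName": "example-model-weights"},
                    "readOnly": True,
                },
            }],
        },
    }

The server reads from /models as if it were local disk while the data actually streams from the bucket, which avoids shipping multi-gigabyte weights inside the image.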

Simplifying the Kubernetes experience and enhancing observability for AI/ML

We understand that data scientists and ML researchers may not be Kubernetes experts. Google aims to simplify the setup and management of AI-optimized Kubernetes clusters. This includes contributions to Kubernetes usability efforts and SIG-Usability. Managed offerings like GKE provide multiple paths to set up AI-optimized environments, from default configurations to customizable blueprints. Offerings like GKE Autopilot further abstract away infrastructure management, aiming for the ease of use that benefits all users.

Ensuring visibility into AI/ML workloads is paramount. Google actively supports and contributes to the integration of standard open-source observability tools within the Kubernetes ecosystem, such as Prometheus, Grafana, and OpenTelemetry. Building on this open foundation, GKE then provides enhanced, out-of-the-box observability integrated with popular AI frameworks & tools, including specific insights into workload startup latency and end-to-end tracing.

Looking ahead: continued investment in Open Source Kubernetes for AI/ML

The transformation continues. Our roadmap includes exciting developments in upstream Kubernetes for easily deploying and managing large-scale clusters, support for new GPU & TPU generations integrated through open-source mechanisms, and continued community-driven innovations in fast startup, reliability, and ease of use for AI/ML workloads.

Google is committed to making Kubernetes the premier open-source platform for AI/ML, pushing the boundaries of scale, performance, and efficiency while maintaining stability and ease of use. By driving innovation in core Kubernetes and building powerful, deeply integrated capabilities in our managed offering, GKE, we are empowering organizations to accelerate their AI/ML initiatives and unlock the next generation of intelligent applications built on an open foundation.

Come explore the possibilities with Kubernetes and GKE for your AI/ML workloads!

By Francisco Cabrera & Federico Bongiovanni, GCP Google Kubernetes Engine

Announcing LMEval: An Open Source Framework for Cross-Model Evaluation

Wednesday, May 14, 2025


Authors: Elie Bursztein - Distinguished Research Scientist & David Tao - Software Engineer, Applied Security and Safety Research

Simplifying Cross-Provider Model Benchmarking

At InCyber Forum Europe in April, we open sourced LMEval, a large model evaluation framework, to help others accurately and efficiently compare how models from various providers perform across benchmark datasets. This announcement coincided with a joint talk with Giskard about our collaboration to increase trust in model safety and security. Giskard uses LMEval to run the Phare benchmark that independently evaluates popular models' security and safety.

Results from the Phare benchmark, which leverages LMEval for evaluation.

Rapid Changes in the Landscape of Large Models

New Large Language Models (LLMs) are released constantly, often promising improvements and new features. To keep up with this fast-paced lifecycle, developers, researchers, and organizations must quickly and reliably evaluate if those newer models are better suited for their specific applications. So far, rapid model evaluation has proven difficult, as it requires tools that allow scalable, accurate, easy-to-use, cross-provider benchmarking.

Introducing LMEval: Simplifying Cross-Provider Model Benchmarking

To address this challenge, we are excited to introduce LMEval (Large Model Evaluator), an open source framework that Google developed to streamline the evaluation of LLMs across diverse benchmark datasets and model providers. LMEval is designed from the ground up to be accurate, multimodal, and easy-to-use. Its key features include:

  • Multi-Provider Compatibility: Evaluating models shouldn't require wrestling with different APIs for each provider. LMEval leverages the LiteLLM framework to offer out-of-the-box compatibility with major model providers including Google, OpenAI, Anthropic, Ollama, and Hugging Face. You can define your benchmark once and run it consistently across various models with minimal code changes (see the sketch after this list).
  • Incremental & Efficient Evaluation: Re-running an entire benchmark suite every time a new model or version is released is slow, inefficient, and costly. LMEval's intelligent evaluation engine plans and executes evaluations incrementally, running only the necessary evaluations for new models, prompts, or questions and saving significant time and compute resources. Its multi-threaded engine further accelerates this process.
  • Multimodal & Multi-Metric Support: Modern foundation models go beyond text. LMEval is designed for multimodal evaluation, supporting benchmarks that include text, images, and code, and adding new modalities is straightforward. It also offers various scoring metrics to cover a wide range of benchmark formats, from boolean questions to multiple choice to free-form generation, along with support for safety/punting detection.
  • Scalable & Secure Storage: To store benchmark results securely and efficiently, LMEval uses a self-encrypting SQLite database. This protects benchmark data and results from inadvertent crawling and indexing while keeping them easily accessible through LMEval.
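Because LMEval builds its provider abstraction on LiteLLM, a quick way to see what that buys is LiteLLM's single completion() call routing to different providers by model name. The snippet below illustrates that underlying mechanism rather than LMEval's own API; the model names are examples, and the corresponding API keys (for example GEMINI_API_KEY and OPENAI_API_KEY) must be set in the environment.

    # Illustration of the LiteLLM layer LMEval builds on: one completion() call
    # routed to different providers purely by model name.
    # Requires provider API keys (e.g. GEMINI_API_KEY, OPENAI_API_KEY) in the environment.
    from litellm import completion

    PROMPT = [{"role": "user", "content": "Is the sky blue on a clear day? Answer yes or no."}]

    for model in ["gemini/gemini-2.0-flash", "gpt-4o-mini"]:  # example model names
        response = completion(model=model, messages=PROMPT)
        print(model, "->", response.choices[0].message.content)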

Getting Started with LMEval

Creating and running evaluations with LMEval is designed to be intuitive: you define a benchmark once, list the models you want to compare (for example, two Gemini model versions), and LMEval runs only the evaluations that are still missing.
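The sketch below is not LMEval's own API; it emulates the same define-once, compare-many flow with direct LiteLLM calls so the shape of an evaluation is visible at a glance. The model names are examples, GEMINI_API_KEY must be set, and the repository's example notebooks show the real LMEval version.

    # Stand-in sketch (not LMEval's API): compare two Gemini versions on a tiny
    # boolean benchmark and report accuracy. Model names are examples and
    # GEMINI_API_KEY must be set in the environment.
    from litellm import completion

    BENCHMARK = [
        ("Is the sky blue on a clear day? Answer yes or no.", "yes"),
        ("Is the sky green on a clear day? Answer yes or no.", "no"),
    ]
    MODELS = ["gemini/gemini-1.5-flash", "gemini/gemini-2.0-flash"]

    for model in MODELS:
        correct = 0
        for prompt, expected in BENCHMARK:
            reply = completion(model=model, messages=[{"role": "user", "content": prompt}])
            answer = reply.choices[0].message.content.strip().lower()
            correct += answer.startswith(expected)
        print(f"{model}: {correct}/{len(BENCHMARK)} correct")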


The LMEval GitHub repository includes example notebooks to help you get started.

Visualizing Results with LMEvalboard

Understanding benchmark results requires more than just summary statistics. To help with this, LMEval includes LMEvalboard, a companion dashboard tool that offers an interactive visualization of how models stack up against each other. LMEvalboard provides valuable insights into model strengths and weaknesses, complementing traditional raw evaluation data.

The LMEvalboard UI lets you quickly analyze how models compare on a given benchmark.

LMEvalboard allows you to:

  • View Overall Performance: Quickly compare all models' accuracy across the entire benchmark.
  • Analyze a Single Model: Dive deep into a specific model's performance characteristics across different categories using radar charts, and drill down on specific examples of failures.
  • Perform Head-to-Head Comparisons: Directly compare two models, visualizing their performance differences across categories and examining specific questions where they disagree.

Try LMEval Today!

We invite you to explore LMEval, use it for your own evaluations, and contribute to its development by heading to the LMEval GitHub repository: https://github.com/google/lmeval

Acknowledgements

LMEval would not have been possible without the help of many people, including: Luca Invernizzi, Lenin Simicich, Marianna Tishchenko, Amanda Walker, and many other Googlers.
